Construction and validation of a clinical risk model based on machine learning for screening characteristic factors of lymphovascular space invasion in endometrial cancer

This study aimed to identify factors that affect lymphovascular space invasion (LVSI) in endometrial cancer (EC) using machine learning, and to build a clinical risk assessment model based on these factors. Data were collected from May 2017 to March 2022 for 312 EC patients treated at Xuzhou Medical University Affiliated Hospital of Lianyungang; 219 were allocated to the training group and 93 to the validation group. Clinical data and laboratory indicators were analyzed, and logistic regression and least absolute shrinkage and selection operator (LASSO) regression were used to analyze risk factors and construct risk models. The LVSI and non-LVSI groups differed significantly in clinical data and laboratory indicators (P < 0.05). Multivariable logistic regression identified myometrial infiltration depth, cervical stromal invasion, lymphocyte count (LYM), monocyte count (MONO), albumin (ALB), and fibrinogen (FIB) as independent risk factors for LVSI in EC (P < 0.05). LASSO regression identified 19 key feature factors for model construction. In both the training and validation groups, the risk scores of the logistic and LASSO models were significantly higher in the LVSI group than in the non-LVSI group (P < 0.001). The resulting machine learning-based model can effectively predict LVSI in EC and may support preoperative decision-making. Its reliability is supported by the significant difference in risk scores between LVSI and non-LVSI patients in both the training and validation groups.

www.nature.com/scientificreports/
The European Society for Medical Oncology (ESMO)-modified classification, the Mayo model, and the GOG-99 model are commonly used to identify low-risk EC patients 8. Of these, the ESMO-modified classification was shown to have the highest accuracy in predicting lymph node metastasis 9, indicating that it is a reliable tool for assessing a patient's risk of lymph node metastasis. The ESMO classification treats lymphovascular space invasion (LVSI) as a key risk-stratification factor in early-stage EC 10, reflecting the close link between LVSI and lymph node metastasis. However, LVSI is diagnosed by postoperative pathological analysis, and there are currently no effective biomarkers for determining LVSI status before or during surgery 11. Intraoperative frozen section analysis is also limited by time and sample size, which makes accurate identification of LVSI difficult 12. Predictive models, which present complex regression equations graphically, have therefore been widely used as statistical tools to simplify clinical prediction 13. Combined predictive models have been used to estimate LVSI risk with some accuracy 14,15; however, existing prediction models for LVSI in EC are few and are mainly based on pathological indexes.
Using machine learning, this study aimed to identify the key characteristic factors affecting LVSI in EC and to build a clinical risk assessment model based on these factors. Constructing and validating such a model may improve the accuracy of LVSI prediction and clinical decision-making, and may provide more accurate prognostic information for high-risk patients.

Ethical statement
This retrospective study complied with the Declaration of Helsinki and was approved by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University, Lianyungang (2022-03). Because of the retrospective nature of the study, the Ethics Committee waived the requirement for informed consent.

Eligibility and exclusion criteria
Inclusion criteria: Patients treated at Xuzhou Medical University Affiliated Hospital of Lianyungang for the first time whose diagnosis of EC was confirmed by postoperative pathology. Surgical treatment comprised either total extrafascial hysterectomy or radical hysterectomy plus bilateral adnexectomy, supplemented as appropriate by peritoneal washing, pelvic lymphadenectomy, and/or para-aortic lymphadenectomy. Patients had not received radiotherapy, chemotherapy, targeted therapy, or hormone therapy before surgery. The findings of the preoperative endometrial biopsy (pathological type and histological grade), myometrial infiltration (determined by pelvic MRI), and tumor diameter (measured by hysteroscopy or gynecological color Doppler ultrasound) were consistent with the postoperative pathological results.
Exclusion criteria: Patients with malignancies in other systems, a history of coagulation dysfunction, autoimmune diseases, severe liver and kidney dysfunction, or those with incomplete clinical medical records were excluded.

Sample source
This study enrolled EC patients who received treatment at Xuzhou Medical University Affiliated Hospital of Lianyungang from May 2017 to March 2022.

Sample screening
A total of 408 patients met the inclusion criteria. After patients meeting the exclusion criteria were removed, 312 were enrolled.

Sample grouping
The enrolled patients were divided into a training group (n = 219) and a validation group (n = 93) at a 7:3 ratio. Each group was then subdivided into an LVSI group and a non-LVSI group according to the presence of LVSI. The training group comprised 163 non-LVSI and 56 LVSI patients, and the validation group comprised 72 non-LVSI and 21 LVSI patients.
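The 7:3 allocation can be sketched as a seeded random split. This is a minimal illustration under stated assumptions, not the authors' procedure. The source does not report the randomization method, the seed, the rounding rule, or whether the split was stratified by LVSI status. The function name `split_7_3` and the default seed are hypothetical. Rounding the training size up is also an assumption, chosen because it reproduces the reported 219/93 split of 312 patients.

```python
import math
import random

def split_7_3(patient_ids, seed=0):
    """Randomly allocate patients to training/validation at roughly 7:3.

    Hypothetical helper: the seed and the round-up rule (ceil) are assumptions,
    not details reported in the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)       # private generator so the split is reproducible
    rng.shuffle(ids)
    n_train = math.ceil(0.7 * len(ids))  # 0.7 * 312 = 218.4 -> 219
    return ids[:n_train], ids[n_train:]

train_ids, val_ids = split_7_3(range(312))
```

A stratified variant would shuffle LVSI and non-LVSI patients separately and apply the same ratio to each, which keeps the LVSI proportion similar in both groups. In the reported groups, the LVSI proportions were 56/219 (about 25.6%) and 21/93 (about 22.6%).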

Outcome measures
1. The clinical data and laboratory indicators of the LVSI and non-LVSI groups in the training group were compared.
2. Logistic regression was used to analyze the risk factors for LVSI in EC.
3. Least absolute shrinkage and selection operator (LASSO) regression was used to screen the characteristic factors of LVSI in EC.
4. Risk models were established using logistic regression and LASSO regression. The two models were then compared for the difference in risk scores between LVSI and non-LVSI patients in the training group and for their predictive efficacy.
5. The clinical data and laboratory indicators of the training and validation groups were compared. The difference in risk scores and the predictive performance of the two models in the validation group were then evaluated.

Statistical analyses
The data were processed using SPSS 20.0 software. Data distribution was analyzed by the Kolmogorov-Smirnov (K-S) test. Measured data were described as mean ± SD; the inter- and intra-group comparisons of normally

LASSO regression analysis
In the present study, we used LASSO regression to screen the characteristic factors associated with LVSI in EC. LASSO regression identified 22 characteristic factors at λ = λmin (0.0046564) and 19 at λ = λ1se (0.010757) (Fig. 1A). Because the sparser λ1se model is expected to generalize better, λ1se was selected, and the 19 characteristic factors retained at this value were used to construct the model. These were age, menarche, menopause, gravidity, parity, history of diabetes, myometrial infiltration depth, tumor diameter, pathological type, histological grade, cervical stromal involvement, adnexal metastasis, FIGO staging, NEUT, LYM, MONO, ALB, FIB, and PLR (Fig. 1B).
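The choice between λmin and the larger λ used for model construction follows the one-standard-error rule commonly applied to cross-validated LASSO (as in glmnet's `lambda.min` and `lambda.1se`). The sketch below assumes that a cross-validation curve is already available as parallel lists of λ values, mean CV error, and its standard error. The function name `select_lambda` and the toy numbers are hypothetical; they are not the values behind Fig. 1A.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation curve.

    lambda_min minimises the mean CV error; lambda_1se is the largest lambda
    whose mean CV error is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lam_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lam_1se

# Illustrative (made-up) cross-validation curve.
lam_min, lam_1se = select_lambda(
    [0.001, 0.003, 0.01, 0.03, 0.1],
    [0.80, 0.78, 0.79, 0.83, 0.95],
    [0.03, 0.03, 0.03, 0.03, 0.04],
)
```

A larger λ applies stronger shrinkage, so the 1-SE choice keeps fewer predictors while giving a CV error statistically close to the minimum. This trade-off is the generalization argument given above.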

Risk model construction
We constructed two risk models, one by logistic regression and one by LASSO regression. Each model calculated the risk score from its own β coefficients (logistic regression, Table 4; LASSO regression, Table 5). In the training group, both models gave significantly higher risk scores for patients in the LVSI group than for those in the non-LVSI group (P < 0.001, Fig. 2A). The DeLong test showed that the area under the curve (AUC) of the logistic regression risk model was significantly lower than that of the LASSO risk model (P < 0.001, Fig. 2B).
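A risk score built from β coefficients is a linear predictor, score = β0 + Σ βk·xk. The sketch below assumes that coefficients are stored in a dictionary keyed by predictor name and that each predictor is already coded numerically. The predictor names and coefficient values are hypothetical placeholders, not those of Table 4 or Table 5. Including the intercept is also an assumption, because the source does not say whether it was used.

```python
import math

def risk_score(patient, coefs, intercept=0.0):
    """Linear predictor: intercept + sum of beta_k * x_k over the model's predictors.

    patient: dict mapping predictor name -> numerically coded value.
    coefs:   dict mapping predictor name -> beta coefficient (hypothetical names).
    """
    return intercept + sum(beta * patient[name] for name, beta in coefs.items())

def probability(score):
    """Convert a logistic-model risk score (log-odds) to a predicted probability."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients and patient record, for illustration only.
coefs = {"factor_a": 0.5, "factor_b": -0.25}
patient = {"factor_a": 1.0, "factor_b": 2.0}
score = risk_score(patient, coefs, intercept=1.0)
```

Adding the same constant to every patient's score does not change how patients are ranked. The intercept therefore affects the predicted probability but not the ROC curve or the AUC.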

Modeling validation
A comparison of baseline data between the training and validation groups showed no significant differences (P > 0.05, Table 6). Logistic and LASSO risk scores were then calculated for the patients in the validation group. LVSI patients had significantly higher logistic and LASSO risk scores than non-LVSI patients in the validation group (P < 0.001, Fig. 3A). The DeLong test again showed that the AUC of the logistic regression risk model was significantly lower than that of the LASSO risk model (P < 0.001, Fig. 3B, Tables 7, 8).
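The DeLong test compares two AUCs measured on the same patients and accounts for the correlation between the two risk scores. The following is a minimal pure-Python sketch of the standard DeLong variance estimate for the difference of two paired AUCs, with a two-sided normal-approximation P value. It is not the software used in the study, which the source does not name for this analysis. The function name and toy scores are hypothetical. Each group needs at least two patients.

```python
import math

def _psi(x, y):
    """Mann-Whitney kernel: 1 if the LVSI score is higher, 0.5 for ties, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0

def delong_test(pos_a, neg_a, pos_b, neg_b):
    """Two-sided DeLong test for two correlated AUCs on the same patients.

    pos_*/neg_*: scores of the same LVSI (pos) and non-LVSI (neg) patients,
    listed in the same order, under model A and model B.
    Returns (auc_a, auc_b, z, p_value).
    """
    m, n = len(pos_a), len(neg_a)
    v10, v01, aucs = [], [], []
    for pos, neg in ((pos_a, neg_a), (pos_b, neg_b)):
        # Structural components: placement value of each positive and each negative.
        comp10 = [sum(_psi(x, y) for y in neg) / n for x in pos]
        comp01 = [sum(_psi(x, y) for x in pos) / m for y in neg]
        v10.append(comp10)
        v01.append(comp01)
        aucs.append(sum(comp10) / m)  # AUC = mean placement value of positives

    def cov(u, w, k):
        """Sample covariance of two length-k lists."""
        mu, mw = sum(u) / k, sum(w) / k
        return sum((a - mu) * (b - mw) for a, b in zip(u, w)) / (k - 1)

    # Variance of AUC_A - AUC_B from the two covariance matrices (S10 and S01).
    var = ((cov(v10[0], v10[0], m) + cov(v10[1], v10[1], m) - 2 * cov(v10[0], v10[1], m)) / m
           + (cov(v01[0], v01[0], n) + cov(v01[1], v01[1], n) - 2 * cov(v01[0], v01[1], n)) / n)
    if var <= 0:
        return aucs[0], aucs[1], 0.0, 1.0  # no detectable difference
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return aucs[0], aucs[1], z, p
```

In practice, `pos_a` and `pos_b` would hold the logistic and LASSO scores of the same LVSI patients in the same order. The same applies to `neg_a` and `neg_b` for the non-LVSI patients.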

Discussion
LVSI is widely regarded as a basis for determining whether tumor cells have metastasized to the lymph nodes 17. It is defined as at least one cluster of tumor cells lying within a space lined by flattened endothelial cells and attached to the vessel wall, as observed by light microscopy of routine postoperative pathological sections 18. The formation of LVSI is a complex pathophysiological process. When tumor cells penetrate the basement membrane and invade the surrounding tissues, they usually also invade interstitial lymphatic vessels and small blood vessels 11. Tumor cells within these vessels can form homotypic aggregates, or bind to white blood cells and platelets to form heterotypic aggregates, resulting in intravascular tumor thrombi 19. These tumor thrombi can then be carried through the bloodstream or lymphatic system to other parts of the body, promoting metastasis.
Clinical prediction models are essential quantitative tools for assessing risks and benefits, offering convenient, intuitive, and accurate information for healthcare professionals and policymakers 20. These models use multivariable regression analysis to integrate multiple predictors, enabling risk quantification and prognostic evaluation for a variety of cancers. For example, Zuber et al. 21 developed a model predicting overall survival (OS) in patients with adrenocortical carcinoma, with AUCs between 0.68 and 0.72 for 5-year and 10-year OS in the training and validation sets. Similarly, another study 22 constructed a nomogram to predict 3-year survival after radical resection in patients with colon cancer, achieving a C-index of 0.918. In addition, an EC prognostic model 23 was formulated based on inflammatory response-related genes (IRGs), and 13 IRGs were identified as independent prognostic markers capable of predicting survival and response to chemotherapy and immunotherapy.
In the present study, logistic regression analysis identified myometrial infiltration depth, cervical stromal invasion, LYM, MONO, ALB, and FIB as independent risk factors for LVSI in EC. These factors point to links between LVSI and various biological processes and immune status. Greater myometrial infiltration depth suggests tumor spread into deeper tissues, increasing the risk of lymphatic vessel invasion 24. Cervical stromal invasion indicates local tumor spread, potentially increasing the risk of lymphatic system invasion. Alterations in LYM and MONO levels may reflect changes in the immune response, possibly caused by changes in the tumor microenvironment that facilitate lymphatic dissemination of tumor cells 25. Furthermore, changes in ALB and FIB levels may reflect inflammatory and coagulation mechanisms involved in tumor aggressiveness and metastasis 26,27. Risk scores calculated from the β coefficients were significantly higher in LVSI patients than in non-LVSI patients in both the training and validation groups. ROC curve analysis further supported the predictive value of the logistic risk model, with AUCs of 0.946 and 0.850 in the training and validation groups, respectively, indicating high accuracy in predicting LVSI in EC.
Although logistic regression models are efficient and robust in predicting LVSI in EC, they may not fully capture the complex, nonlinear relationships underlying LVSI in EC. Recent work, such as that of Shao et al. 28, has shown LASSO outperforming logistic regression in disease prediction, for example in predicting diabetic foot ulcer progression in elderly patients with diabetes. Similarly, we found that LASSO outperformed logistic regression in predicting LVSI in EC. LASSO identified 19 key factors related to LVSI, including patient age, menstrual history, tumor pathology, and hematological markers. This underscores the multifactorial nature of LVSI and the ability of LASSO feature selection to handle many interrelated candidate predictors simultaneously. The LASSO model showed higher discrimination (AUC) than the logistic regression model in both the training and validation groups. This is consistent with the view that LASSO regularization yields more accurate and robust prediction models by shrinking coefficients and setting those of non-contributory features exactly to zero, thereby mitigating overfitting.
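The shrinkage mechanism described above can be illustrated with a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent (ISTA). This is not the authors' software. The source does not say which implementation was used, and R's glmnet, a common choice, uses cyclical coordinate descent instead. The function names, step-size rule, iteration count, and toy data are assumptions for illustration only.

```python
import math

def soft_threshold(z, g):
    """Proximal operator of g*|.|: shrink z toward zero by g, returning exactly 0 inside [-g, g]."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_logistic(X, y, lam, n_iter=5000):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    Minimises mean log-loss + lam * sum(|beta_j|); the intercept is not penalised.
    X is a list of rows of (ideally standardised) predictors, y a list of 0/1 labels.
    Returns (intercept, list of coefficients).
    """
    n, p = len(X), len(X[0])
    # Conservative step size: the gradient's Lipschitz constant is at most
    # 0.25 * trace(Z^T Z) / n, where Z is X with a leading column of ones.
    trace = sum(1.0 + sum(v * v for v in row) for row in X)
    step = 1.0 / (0.25 * trace / n)
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus observed label) at the current fit.
        resid = []
        for row, yi in zip(X, y):
            eta = b0 + sum(bj * xj for bj, xj in zip(beta, row))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - yi)
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Plain gradient step for the intercept; gradient step plus shrinkage for the rest.
        b0 -= step * grad0
        beta = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(beta, grad)]
    return b0, beta

# Hypothetical toy data: the first predictor is associated with the outcome.
X = [[-1.5, 0.3], [-1.2, 0.7], [-1.0, -0.8], [-0.5, 1.1], [0.0, -0.2], [0.0, -0.2],
     [0.2, 0.1], [0.5, 0.9], [1.0, -1.2], [1.2, -0.5], [1.5, 0.4]]
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
```

With a very large penalty (for example `lam=10.0`), every coefficient stays exactly zero. With a small penalty (`lam=0.01`), the informative first predictor is retained. Increasing λ along a path therefore reduces the number of retained factors, which is why the larger λ chosen for model construction keeps fewer factors (19) than λmin (22).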
This study has limitations typical of a single-center retrospective design, such as a small sample size and selection bias, which limit further improvement of the model's predictive performance and affect the interpretation of the results. In the future, we will validate and optimize

Conclusion
Overall, this study demonstrated the feasibility of using LASSO to establish a more accurate and reliable prediction model for LVSI in EC. The model combines patients' clinical features, tumor characteristics, and hematological indicators, reflecting the high-risk factors for LVSI from multiple aspects. Applying this prediction model may help clinicians identify patients at high risk of LVSI earlier and adjust their treatment plans accordingly, with the aim of improving prognosis.

Figure 1. LASSO regression screening for characteristic factors of lymphovascular space invasion in endometrial cancer. (A) LASSO regression for the screening of characteristic factors associated with lymphovascular space invasion in endometrial cancer; (B) the 19 characteristic factors selected at λ = λ1se.

Figure 3. Comparison of risk scores and predictive efficiency between LVSI and non-LVSI patients in the validation group. (A) Comparison of the risk scores of patients in the validation group calculated by logistic regression and LASSO regression; (B) ROC curve analysis of the AUCs of logistic regression and LASSO regression risk scores in predicting lymphovascular space invasion in endometrial cancer. Note: LVSI, lymphovascular space invasion; ROC, receiver operating characteristic; AUC, area under the curve; C, non-LVSI group; P, LVSI group.

Table 1. Differences in clinical data between the LVSI and non-LVSI groups in the training group. BMI, body mass index; FIGO, International Federation of Gynecology and Obstetrics.

Table 2. Comparison of laboratory indicators between patients with and without LVSI in the training group. NEUT, neutrophil count; LYM, lymphocyte count; MONO, monocyte count; PLT, platelet count; ALB, albumin; FIB, fibrinogen; PNI, prognostic nutritional index; NLR, neutrophil-to-lymphocyte ratio; MLR, monocyte-to-lymphocyte ratio; PLR, platelet-to-lymphocyte ratio. Normally distributed data are expressed as mean ± SD, and non-normally distributed data as median and interquartile range (IQR).

Table 5. LASSO characteristic variables.
Figure 2. Comparison of risk scores and predictive efficiency between LVSI and non-LVSI patients in the training group. (A) Comparison of the risk scores of patients in the training group calculated by logistic regression and LASSO regression; (B) ROC curve analysis of the AUCs of logistic regression and LASSO regression risk scores in predicting lymphovascular space invasion in endometrial cancer. Note: LVSI, lymphovascular space invasion; ROC, receiver operating characteristic; AUC, area under the curve; C, non-LVSI group; P, LVSI group.

Table 6. Comparison of the clinical data between the training and validation groups. BMI, body mass index; FIGO, International Federation of Gynecology and Obstetrics; NEUT, neutrophil count; LYM, lymphocyte count; MONO, monocyte count; PLT, platelet count; ALB, albumin; FIB, fibrinogen; PNI, prognostic nutritional index; NLR, neutrophil-to-lymphocyte ratio; MLR, monocyte-to-lymphocyte ratio; PLR, platelet-to-lymphocyte ratio. Normally distributed data are expressed as mean ± SD, and non-normally distributed data as median and interquartile range (IQR).